Add Query for reranking KnnFloatVectorQuery with full-precision vectors #14009
Conversation
The build fails with Edit: It has been moved to backward codecs. Will use something more stable.
I have a preliminary benchmark here (top-k=100, fanout=0) using the Cohere 768 dataset. Anyhow, I can see two things that should be addressed:
```java
}
Weight weight = indexSearcher.createWeight(rewritten, ScoreMode.COMPLETE_NO_SCORES, 1.0f);
HitQueue queue = new HitQueue(k, false);
for (var leaf : reader.leaves()) {
```
Should this be switched to parallel execution similar to AbstractKnnVectorQuery?
Good question. I was using a single thread as a simple version to benchmark the latency first, since multi-threading could add some overhead as well. This class only does vector loading and similarity computation for a small set of vectors (k * oversample), so it's not as CPU-intensive as AbstractKnnVectorQuery.
I'll also try multi-threading and run the benchmark again. From the benchmark below, the re-ranking phase only adds a trivial amount of latency, so multi-threading might not help much. Also, the benchmark code seems to force-merge so there's only a single segment; we need to change it so that there are multiple segments.
Edit: My previous benchmark was wrong because the vectors were corrupted.

The first benchmark shows the recall improvement for each oversample factor with reranking. It now aligns with what was produced in #13651.

The second benchmark compares the latency across all algorithms. We are still adding only a small latency for the reranking phase.

For the last benchmark, I just ran oversample without reranking, but still cut off at the original k (so it acts similar to fanout). This is just to make sure that the reranking phase actually adds value. Expectedly, the recall does not improve much compared to reranking.

NOTE: The dots in all benchmarks represent the oversample factor, with values of 1, 1.5, 2, 3, 4, 5. An oversample of 1 means no over-sampling. See https://github.com/mikemccand/luceneutil/blob/main/src/main/knn/KnnGraphTester.java#L833-L834
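For reference, the recall reported in such benchmarks is just the overlap between the approximate top-k and the exact top-k. A minimal sketch in Python (presumably equivalent to what luceneutil's KnnGraphTester computes, though the exact harness is not shown here):

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k results that the approximate top-k found."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# An oversampled-then-reranked result list improves this metric only when
# reranking promotes true top-k docs into the first k positions.
print(recall_at_k([7, 3, 9, 1], [3, 7, 2, 5], k=2))  # 1.0: both exact top-2 found
print(recall_at_k([7, 3, 9, 1], [3, 7, 2, 5], k=4))  # 0.5: only 3 and 7 of the exact top-4
```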
Also, this is the luceneutil branch I used for benchmarking: https://github.com/dungba88/luceneutil/tree/dungba88/two-phase-search, which incorporates the test for the BQ implementation by @benwtrent and the two-phase search.
```java
float expectedScore = VECTOR_SIMILARITY_FUNCTION.compare(targetVector, docVector);
Assert.assertEquals(
    "Score does not match expected similarity for doc ord: " + scoreDoc.doc + ", id: " + id,
    expectedScore,
    scoreDoc.score,
    1e-5);
}
```
We can test that the results are sorted by exact distance.
Maybe we can also test that the result of the same query with oversample will be at least the same or better than without oversample? By "better" I mean we should have higher recall. But I'm not sure if that's deterministic.
Thinking again, the docs should be sorted by ord, so my first point should be irrelevant.
```java
HitQueue queue = new HitQueue(k, false);
for (var leaf : reader.leaves()) {
```
Here we have access to IndexSearcher#getTaskExecutor and could use it to parallelize the work across segments (like we did earlier with some other query rewrites). But the HitQueue here isn't thread-safe. I don't know if using concurrency after making insertWithOverflow thread-safe would really be helpful, since it looks like the added cost is cheap? Or maybe it will be?
That's right. In order to apply parallelism we need to use a per-segment queue, then merge them like in AbstractKnnVectorQuery.mergeLeafResults. I think the added latency is already low, but I still want to try whether it helps.
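The per-segment approach above can be sketched as follows, with a Python stand-in where each segment has already filled its own queue and only the final merge is sequential (loosely analogous to AbstractKnnVectorQuery.mergeLeafResults, which operates on TopDocs):

```python
import heapq

def merge_leaf_results(per_segment_hits, k):
    # per_segment_hits: one list of (score, doc_id) pairs per segment,
    # each already reduced to that segment's top hits by its own queue.
    # Segments can fill their queues in parallel without sharing state;
    # only this merge step needs to see all results.
    all_hits = [hit for seg in per_segment_hits for hit in seg]
    return heapq.nlargest(k, all_hits)  # highest-scoring hits first

segments = [
    [(0.91, 10), (0.55, 12)],    # segment 0's top-2
    [(0.87, 203), (0.60, 250)],  # segment 1's top-2
]
print(merge_leaf_results(segments, 2))  # [(0.91, 10), (0.87, 203)]
```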
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!
I think this is a nice overall approach, adding a new query. It's reminiscent of Lucene's existing rescoring support. I'm curious about your results here: why is recall better for 1-bit and 4-bit than 7-bit, when reranking?
The graph is a bit confusing, but the dots are the oversample factors (from 1 to 5). If we compare the recall at the same oversample, then 7-bit is always better or the same. The difference becomes smaller at higher oversample; e.g., at oversample=1, 7-bit has 20% higher recall than 1-bit, but at oversample=5 they are mostly the same.
lucene/CHANGES.txt
```
* GITHUB#13285: Early terminate graph searches of AbstractVectorSimilarityQuery to follow timeout set from
  IndexSearcher#setTimeout(QueryTimeout). (Kaival Parikh)
```
Seems like my IDE automatically removes extra spaces. If there is an objection, I'll revert that in the next rev, along with addressing other feedback.
```
to speed up computing the number of hits when possible. (Lu Xugang, Luca Cavanna, Adrien Grand)

* LUCENE-10422: Monitor Improvements: `Monitor` can use a custom `Directory`
```
Looks like a lot of unrelated changes, probably needs a merge from main?
It's due to my IDE automatically removing extra white spaces; I will revert in the next rev.
This problem is fixed for all files in main now, and any new trailing whitespace will fail the CI build.
```java
import org.apache.lucene.index.IndexReader;

/**
 * A Query that re-scores another Query with a DoubleValuesSource function and cuts off the results at
```
This sounds generic to any query, but rewrite always returns a KnnFloatVectorQuery ?
The createRewrittenQuery method returns a DocAndScoreQuery, which is currently internal to KNN, but from the client/API point of view it's the same as any other Query. Moreover, it only contains the doc IDs and their respective scores.
We can extract createRewrittenQuery into a separate class for more reusability if needed. I can put up a prerequisite PR if that makes sense.
Thanks for the explanations, @dungba88. I suppose the scenario you're trying to solve for is when users want to change the matchset of a KnnVectorQuery using full-precision or other reranking. I'm open to this change if it's a valid requirement. My perception is that it should be solvable by plumbing vector query results into a rescorer, then combining top-k hits with other (lexical) hits. E.g., is this also a problem for hybrid search in OpenSearch/Elasticsearch? I suspect they might have independent queries for both, with some way to combine results.
Yes, that's correct, @vigyasharma. We are using a hybrid search where KnnFloatVectorQuery and TermQuery (amongst others) are combined into a single BooleanQuery. Thus it is important to change the matchset of KnnFloatVectorQuery individually. For hybrid search in OpenSearch/Elasticsearch, I'm wondering if @jmazanec15 and @benwtrent have any input. I have a feeling that it's quite common to combine lexical + KNN matching into a single BooleanQuery.
kNN queries are completed in the rewrite phase; if any rescoring needs to be done, it should be done during that phase. I would expect the experience to be: kNN should be "just another query" and should be combinable with any other query. I realize this is a bit tricky, as kNN is unique in that it effectively "collects" its results up-front.
vigyasharma left a comment:
Thanks for persisting on this @dungba88 , changes look good. I have a few suggestions but this looks almost ready!
```java
  scoreDocs[i++] = topDoc;
}
TopDocs topDocs =
    new TopDocs(new TotalHits(queue.size(), TotalHits.Relation.EQUAL_TO), scoreDocs);
```
Instead of setting this to configured n, should we retain the total no. of hits and relation from original query?
I have no preference, but I can't find where Scorer or Weight would expose the relation of the original query.
I used the original result count as totalHits, but kept the relation as EQUAL_TO, as the original relation doesn't seem to be exposed anywhere. Usually it would only be visible on search(query, n) with a TopDocsCollector.
I also realized that this totalHitsRelation value is not used anywhere, so it would not matter.
lucene/core/src/java/org/apache/lucene/search/RescoreTopNQuery.java
lucene/core/src/test/org/apache/lucene/search/TestRescoreTopNQuery.java
Sorry for spamming the replies! I should have gone to the Files changed tab, which allows sending all replies in the same message.
I'm not sure I'm following the discussion completely. It is common to combine lexical and k-NN in a boolean query, but I think there is a lot of variety in what/how users are implementing hybrid search, so flexibility is great to have. Like @benwtrent mentioned, result computation for k-NN is done upfront, but queries can also be used to re-score after the initial phase via the QueryRescorer (as @mikemccand mentioned a while ago), so I don't think a separate rescorer is necessary. I also like the point around lazy iteration: "or at least provide a scorer that iterates the kNN query results calculating the higher fidelity scores". For expensive re-scoring (I think multi-vector will be), this might be nice to have for hybrid/boolean queries too; I think this approach is taken in FloatVectorSimilarityQuery. But this can probably be left for future consideration.
Rescorer can be used, but IIUC Rescorer works only in the collection phase: after the first-pass collection we rescore the final results. This would not work if we combine semantic and lexical matching into a single Query; in that case we can only rescore the combined matches. Like @benwtrent mentioned, rescoring should be done in the rewrite phase. This will work both where semantic and lexical are combined and where semantic matching is used alone.
This is also handled by this PR: it's technically not a Scorer but a DoubleValuesSource. Users can either use full-precision vectors, or even use another field for rescoring (amongst other use cases, as DoubleValuesSource is extensible). A potential idea is to have a 1-bit vector field for matching and another 4-bit or 7-bit vector field for rescoring. However, the quantization cost of the query vector, even for scalar 7-bit, is a bit high. We will tackle it as a future optimization to this new Query. Also, thanks @vigyasharma for approving the PR! Can you help merge it if there is no objection?
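That pluggable-rescoring idea can be illustrated with a plain function standing in for the DoubleValuesSource; names and shapes here are illustrative, not Lucene API. The first pass only decides which docs get rescored, and any per-doc score source can rerank them:

```python
def rescore_top_n(first_pass_hits, n, score_source):
    """first_pass_hits: (score, doc_id) pairs from the over-sampled first pass.
    score_source: doc_id -> new score (stand-in for a DoubleValuesSource)."""
    rescored = [(score_source(doc_id), doc_id) for _, doc_id in first_pass_hits]
    rescored.sort(reverse=True)  # best new score first
    return rescored[:n]

# E.g. match on a cheap 1-bit field, rescore from a higher-fidelity field:
high_fidelity = {1: 0.31, 2: 0.92, 3: 0.58}
hits = [(0.8, 1), (0.7, 3), (0.6, 2)]  # first-pass order is misleading here
print(rescore_top_n(hits, 2, high_fidelity.get))  # [(0.92, 2), (0.58, 3)]
```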
Yes, I'll merge in these changes tonight. Was waiting a day to allow people to give feedback on the latest revision if they want to. |
@dungba88 – Can this change go in 10.3 instead of waiting for 11.0? I didn't see anything blocking, so I updated the changes entry. But I'm running into some merge issues while backporting. If you would like to raise a separate PR for the 10.3 backport (against branch_10x), I can help review it. Or if this can only go in 11.0, then I'll update back the changes entry. Let me know.
It should be possible to backport to 10.3. I'll raise a PR. Thanks @vigyasharma for merging!
I put a backport PR to 10.3 here: #14860




Description
fixes #13564
Added a new Query which wraps KnnFloatVectorQuery and does re-ranking for a quantized index using full-precision vectors. The idea is to first run KnnFloatVectorQuery with an over-sampled k (x1.5, x2, x5, etc.), then re-rank the docs using the full-precision (original, non-quantized) vectors, and finally take the top k.
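A minimal sketch of that two-phase idea, with brute-force search standing in for the first-pass vector search and sign quantization standing in for the real scalar quantizer (both are simplifications, not what the PR implements):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quantize(v):
    # Crude 1-bit stand-in for the real scalar quantizer: keep only signs.
    return [1.0 if x >= 0.0 else -1.0 for x in v]

def two_phase_knn(query, vectors, k, oversample=2.0):
    # Phase 1: rank by similarity over quantized vectors, keeping
    # k * oversample candidates (brute force here; the PR runs
    # KnnFloatVectorQuery with the over-sampled k).
    q = quantize(query)
    candidates = sorted(
        range(len(vectors)),
        key=lambda i: dot(q, quantize(vectors[i])),
        reverse=True,
    )[: int(k * oversample)]
    # Phase 2: re-rank candidates with the full-precision vectors, cut at k.
    return sorted(candidates, key=lambda i: dot(query, vectors[i]), reverse=True)[:k]

vectors = [[0.9, 0.9], [1.0, -0.1], [-0.5, -0.5], [0.2, 0.3]]
query = [1.0, 1.0]
print(two_phase_knn(query, vectors, k=2, oversample=2.0))  # [0, 1]
print(two_phase_knn(query, vectors, k=2, oversample=1.0))  # [0, 3]: quantization error keeps doc 1 out
```

With oversample=1 the quantized first pass never surfaces doc 1 (its true score beats doc 3's, but quantization hides that), which is exactly the recall gap the reranking phase closes.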
Questions:

- Should we expose the `target` inside `KnnFloatVectorQuery` so that users don't need to pass the target twice? Currently it only exposes `getTargetCopy()`, which requires an array copy, so it's inefficient; but I assume the intention is to encapsulate the array so that it won't be modified from outside?
- Should we `mlock` to prevent the quantized vectors from being swapped out, as loading full-precision vectors (although only a small set per query) means there will be more pressure on RAM?

Usage: